Abstract

In this article, we study the approximation of a probability measure
$\mu$ on $\mathbb{R}^d$ by its empirical measure $\hat\mu_N$, interpreted as a random
quantization. As error criterion we consider an averaged $p$-th moment Wasserstein
metric. In the case where $2p < d$, we establish refined upper and lower
bounds for the error, a high-resolution formula. Moreover, we provide a
universal estimate based on moments, a so-called Pierce type estimate. In
particular, we show that quantization by empirical measures is of optimal
order under weak assumptions.

Keywords. Constructive quantization, Wasserstein metric, transportation problem,
Zador's theorem, Pierce's lemma, random quantization.

1 Introduction

Constructive quantization is concerned with the efficient computation of discrete
approximations to probability distributions. The need for such approximations
mainly stems from two applications: firstly from information theory, where the
approximation is a discretized version of an original signal which is to be stored
on a data storage medium or transmitted via a channel (see e.g. [Zad66, BW82,
GG92]); secondly, from numerical integration, where integrals with respect to
the original measure are replaced by the integral with respect to the discrete
approximation (see e.g. [PPP03]).

In both applications the objective is to find an optimal discrete subset of a
metric space $(E, d)$ of cardinality $N$ say, a so-called codebook, depending on the
given probability measure $\mu$ on $E$. In the first application one further needs fast
coding and decoding schemes that find for a signal a digital representation of a
close element of the codebook or, resp., translate the digital representation back.
Clearly, the best coding scheme would map a signal to a digital representation
of a closest neighbour in the codebook. The quantization number measures the
smallest possible averaged distance of a $\mu$-distributed point to the codebook and
hence the performance of the best possible approximate coding of $\mu$ using $N$
approximating points, which corresponds to using $\log_2 N$ bits.

During the last decade, quantization attracted much interest mainly due to
the second application, see for instance [GB11] for a recent review on financial
applications. Here one aims at finding a codebook together with probability
weights and the objective is to determine these in such a way that the distance
between $\mu$ and the discrete probability measure is minimal with respect to some
metric (e.g. a Wasserstein metric). Typically, the optimal solutions of both
problems are closely related. The optimal codebook of the first problem is also
optimal for the second one and the optimal probability weights are the $\mu$-weights of
the corresponding Voronoi cells. In particular, the optimal approximation errors
are again the quantization numbers. A regularly updated list of articles dealing
with quantization can be found at http://www.quantize.matris-fi.com/.

From a constructive point of view, the two applications differ significantly
and our research is mainly motivated by the second application. For moderate
codebook sizes and particular probability measures it is feasible to run
optimization algorithms and find approximations that are arbitrarily close to the
optimum (see e.g. [Pag98, PP03]). See also [MGRY11] for a recent constructive
approach towards discrete approximation of marginals of stochastic differential
equations. For large codebook sizes and probability measures that are defined
implicitly, it is often not feasible to find close to optimal quantizations in
reasonable time. For instance, large codebooks are necessary when using quantizations
for approximate sampling.

As an alternative approach we analyze the use of the empirical measure $\hat\mu_N$
generated by $N$ independent random variables distributed according to the
original measure $\mu$. As error criterion we consider an averaged $L^p$-Wasserstein
metric. We stress that in our case the codebook is generated by i.i.d. samples and
that the weights all have equal mass so that once the codebook is generated no
further processing is needed. The advantage of using the empirical measure as
a discrete approximation of $\mu$ is that it is usually easy to generate efficiently
even for large $N$. The disadvantage is, of course, that for given $N$, the averaged
Wasserstein distance between $\mu$ and $\hat\mu_N$ is larger than that between $\mu$ and the
optimal probability measure supported on $N$ points. We will show that in the
case $E = \mathbb{R}^d$ equipped with some norm (which is the only case we consider in
this article), the loss of performance is a multiplicative constant. While the
empirical measure turns out to be a reasonable approximation that can be
computed efficiently, the analysis of its performance is complicated by the fact that
the problem is nonlocal, since we take equal weights rather than
optimal weights as in [Coh04, Yuk08] (see the following subsection).

A full treatment of quantization typically includes the derivation of
asymptotic formulas in terms of the density of the absolutely continuous part of $\mu$,
a high resolution formula. Such a formula has been established for optimal
quantization under norm-based distortions [DGLP04], for general Orlicz-norm
distortions [DV11], and, very recently, also in the dual quantization problem
[PW10]. In this article, we prove a high resolution formula for the empirical
measure under an averaged $L^p$-Wasserstein metric. Further, a Pierce type
result is derived. In particular, we obtain order optimality of the new approach
under weak assumptions.

The article is organised as follows. Section 1 introduces the basic notation
and summarizes the main results. Section 2 is devoted to the Pierce type result,
see Theorem 1 below. Section 3 treats the particular case where $\mu$ is the uniform
distribution on $[0,1)^d$. It includes a proof of part (i) of Theorem 2 below. Finally,
the high resolution formula provided by Theorem 2 is proved in Section 4.



1.1 Notation 

We introduce the relevant notation along an example. Consider the following
problem arising from logistics. There is a demand for a certain economic good
on $\mathbb{R}^2$ modelled by a finite measure $\mu$. The demand shall be accommodated by
$N$ service centers that are placed at positions $x_1, \dots, x_N \in \mathbb{R}^2$ and that have
nonnegative capacities $p_1, \dots, p_N$ summing up to $\|\mu\| := \mu(\mathbb{R}^2)$. We associate
a given choice of supporting points $x_1, \dots, x_N$ and weights $p_1, \dots, p_N$ with a
measure $\hat\mu = \sum_{i=1}^N p_i \delta_{x_i}$, where $\delta_x$ denotes the Dirac measure in $x$. In order
to cover the demand, goods have to be transported from the centers to the
customers and we describe a transport schedule by a measure $\xi$ on $\mathbb{R}^2 \times \mathbb{R}^2$ such
that its first, respectively second, marginal measure is equal to $\mu$, respectively
$\hat\mu$. The set of admissible transport schedules (transports) is denoted by $\mathcal{M}(\mu, \hat\mu)$
and supposing that transporting a unit mass from $y$ to $x$ causes cost $c(x,y)$, a
transport $\xi \in \mathcal{M}(\mu, \hat\mu)$ causes overall cost

$$ \int_{\mathbb{R}^2 \times \mathbb{R}^2} c(x,y) \, d\xi(x,y). $$

In this article, we focus on norm based cost functions. In general, we assume
that the demand is a finite measure on $\mathbb{R}^d$ and that the cost is of the form

$$ c(x,y) = \|x - y\|^p, $$

where $p \ge 1$ and $\| \cdot \|$ is a fixed norm on $\mathbb{R}^d$. Given $\mu$ and $\hat\mu$, the minimal cost
is the $p$th power of the $p$th Wasserstein metric.

Definition 1 (pth Wasserstein metric) The $p$th Wasserstein metric of two
finite measures $\mu$ and $\nu$ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$, which have equal mass, is given by

$$ \rho_p(\mu,\nu) = \Big( \inf_{\xi \in \mathcal{M}(\mu,\nu)} \int \|x-y\|^p \, \xi(dx,dy) \Big)^{1/p}, $$

where $\mathcal{M}(\mu,\nu)$ is the set of all finite measures $\xi$ on $\mathbb{R}^d \times \mathbb{R}^d$ having marginal
distributions $\mu$ in the first component and $\nu$ in the second component.

The Wasserstein metric originates from the Monge-Kantorovich mass
transportation problem, which was introduced by G. Monge in 1781 [Mon81].
Important results about the Wasserstein metric were achieved within the scope of
transportation theory, for instance by Kantorovich [Kan42], Kantorovich and
Rubinstein [KR58], Wasserstein [Was69], Rachev and Rüschendorf [RR98a,
RR98b] and others.

Note that the Wasserstein metric is homogeneous in $(\mu, \nu)$ so that one can
restrict attention to probability measures. In this article, we analyse for a given
probability measure $\mu$ on $\mathbb{R}^d$ the quality of the empirical measure as
approximation. More explicitly, we denote by $\hat\mu_N$ the (random) empirical measure of
$N$ independent $\mu$-distributed random variables $X_1, \dots, X_N$, that is

$$ \hat\mu_N = \frac{1}{N} \sum_{j=1}^{N} \delta_{X_j}, $$

and, for fixed $p \ge 1$, we analyse the asymptotic behaviour of the so called
random quantization error

$$ V^{\mathrm{rand}}_{N,p}(\mu) := E[\rho_p^p(\mu, \hat\mu_N)]^{1/p}, $$

as $N \in \mathbb{N}$ tends to infinity.

This quantity should be compared with the optimal approximation in the
$L^p$-Wasserstein metric supported by $N$ points, that is

$$ V^{\mathrm{opt}}_{N,p}(\mu) := \inf_{\nu} \rho_p(\mu, \nu), \qquad (1) $$

where the infimum is taken over all probability measures $\nu$ on $\mathbb{R}^d$ that are
supported on $N$ points. The quantity $V^{\mathrm{opt}}_{N,p}(\mu)$ is local in the sense that for a
given set $C \subset \mathbb{R}^d$ of supporting points used in an approximation $\nu$, the optimal
choice for $\nu$ is $\mu \circ \pi_C^{-1}$, where $\pi_C$ denotes a projection from $\mathbb{R}^d$ to $C$. Hence,
the minimisation of the latter quantity reduces to a minimisation over all sets
$C \subset \mathbb{R}^d$ of at most $N$ elements. Furthermore, the minimal error is the so called
$N$th quantization number

$$ V^{\mathrm{opt}}_{N,p}(\mu) = \inf_{C} \Big( \int \min_{y \in C} \|x - y\|^p \, \mu(dx) \Big)^{1/p}. $$

For a measure $\mu$ on $\mathbb{R}^d$ we denote by $\mu = \mu_a + \mu_s$ its Lebesgue
decomposition with $\mu_a$ denoting the absolutely continuous part with respect to Lebesgue
measure $\lambda^d$ and $\mu_s$ the singular part.

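Further, we denote the uniform distribution on $[0,1)^d$ by $U$, write $\hat U_N$ for the
empirical measure of $N$ independent $U$-distributed random variables, and define

$$ \underline{V}^{\mathrm{rand}}_{N,p} := E\Big[ \inf_{U' \in \mathcal{A}} \; \inf_{\xi \in \mathcal{M}(U', \hat U_N)} \int \|x - y\|^p \, \xi(dx, dy) \Big]^{1/p}, $$

where $\mathcal{A}$ denotes the set of all probability measures $U'$ on $[0,1]^d$ which satisfy
$U'(A) \le U(A)$ for each Borel set $A \subset (0,1)^d$. Note that the latter quantity allows
to have leakage in the boundaries of the support of the uniform measure $U$.
Therefore, $\underline{V}^{\mathrm{rand}}_{N,p} \le V^{\mathrm{rand}}_{N,p}(U)$. It seems plausible that the ratio of
$\underline{V}^{\mathrm{rand}}_{N,p}$ and $V^{\mathrm{rand}}_{N,p}(U)$ converges to one as $N \to \infty$. However, this has not
been proved yet.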

1.2 Main results 

We will assume throughout the paper that $d \ge 3$. The approximation by
empirical measures satisfies a so-called Pierce type estimate.

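Theorem 1 Let $p \in [1, \frac{d}{2})$ and $q > \frac{dp}{d-p}$. There exists a constant $\kappa^{\mathrm{Pierce}}_{p,q}$ such
that for any probability measure $\mu$ on $\mathbb{R}^d$

$$ V^{\mathrm{rand}}_{N,p}(\mu) \le \kappa^{\mathrm{Pierce}}_{p,q} \Big[ \int \|x\|^q \, d\mu(x) \Big]^{1/q} N^{-1/d} \qquad (2) $$

for all $N \in \mathbb{N}$.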




Remark 1 • The constant in the statement of Theorem 1 is explicit, see
Theorem 3. Its value depends on the chosen norm on $\mathbb{R}^d$.

• For $p > \frac{d}{2}$ and discrete measures $\mu$, the random approach typically induces
errors $V^{\mathrm{rand}}_{N,p}(\mu)$ that are not of order $O(N^{-1/d})$: take, for instance, two
different points $a, b \in \mathbb{R}^d$ and let $\mu = \frac12 \delta_a + \frac12 \delta_b$. Then $N \hat\mu_N(\{a\})$ is
binomially distributed with parameters $N$ and $\frac12$. Consequently,

$$ V^{\mathrm{rand}}_{N,p}(\mu) = E[\rho_p^p(\mu, \hat\mu_N)]^{1/p} = \|a - b\| \, E\big[ |\hat\mu_N(\{a\}) - \tfrac12| \big]^{1/p} $$

is of order $N^{-1/(2p)}$ and, hence, converges to zero strictly slower than $N^{-1/d}$.

• In [AKT83], the case where $d = 2$, $p = 1$ and $\mu = U$ is treated. There
it is found that the $L^1$-Wasserstein distance between two independent
realisations of $\hat U_N$ is typically of order $N^{-1/2} (\log N)^{1/2}$, which shows the
necessity of the assumption $d \ge 3$.

• For the uniform distribution $U$ on $[0,1)^d$, the results of Talagrand [Tal94]
imply that $V^{\mathrm{rand}}_{N,p}(U)$ is always of order $N^{-1/d}$ as long as $d \ge 3$.
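The two-point example of Remark 1 can be checked numerically. The sketch below is an illustration and not part of the original article; the function names are hypothetical. It evaluates $E|\hat\mu_N(\{a\}) - \frac12|$ exactly from the binomial distribution with integer arithmetic and then forms $\|a-b\| \, E[\,\cdot\,]^{1/p}$, so that the rate $N^{-1/(2p)}$ can be read off.

```python
from math import comb, sqrt, pi

def mean_abs_deviation(n):
    """Exact E|S/n - 1/2| for S ~ Binomial(n, 1/2).

    Uses |k/n - 1/2| = |2k - n| / (2n) and integer arithmetic to avoid overflow.
    """
    numerator = sum(comb(n, k) * abs(2 * k - n) for k in range(n + 1))
    return numerator / (2 * n * 2 ** n)

def two_point_error(n, p, dist_ab=1.0):
    """Random quantization error for mu = (delta_a + delta_b)/2, ||a - b|| = dist_ab."""
    return dist_ab * mean_abs_deviation(n) ** (1.0 / p)

if __name__ == "__main__":
    p, d = 2.0, 3  # here p > d/2, the regime discussed in the remark
    for n in (100, 400, 1600):
        err = two_point_error(n, p)
        # err * n^(1/(2p)) stabilises, while err * n^(1/d) keeps growing
        print(n, err * n ** (1 / (2 * p)), err * n ** (1 / d))
```

For large $N$ the quantity $\sqrt{N} \, E|\hat\mu_N(\{a\}) - \frac12|$ approaches $1/\sqrt{2\pi}$, which is the classical mean absolute deviation asymptotics of the symmetric binomial distribution.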

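The following theorem is a high resolution formula for quantization by
empirical measures.

Theorem 2 Let $p \in [1, \frac{d}{2})$.

(i) Let $U$ denote the uniform distribution on $[0,1)^d$. There exists a constant
$\overline{\kappa}^{\mathrm{unif}}_p \in (0, \infty)$ such that

$$ \lim_{N \to \infty} N^{1/d} \, V^{\mathrm{rand}}_{N,p}(U) = \overline{\kappa}^{\mathrm{unif}}_p. $$

Further, there exists a constant $\underline{\kappa}^{\mathrm{unif}}_p \in (0, \infty)$ such that

$$ \lim_{N \to \infty} N^{1/d} \, \underline{V}^{\mathrm{rand}}_{N,p} = \underline{\kappa}^{\mathrm{unif}}_p. $$

(ii) Let $\mu$ be a probability measure on $\mathbb{R}^d$ that has a finite $q$th moment for some
$q > \frac{dp}{d-p}$ and suppose that $\frac{d\mu_a}{d\lambda^d}$ is Riemann integrable or $p = 1$. Then

$$ \limsup_{N \to \infty} N^{1/d} \, V^{\mathrm{rand}}_{N,p}(\mu) \le \overline{\kappa}^{\mathrm{unif}}_p \Big( \int \Big( \frac{d\mu_a}{d\lambda^d} \Big)^{1 - \frac{p}{d}} d\lambda^d \Big)^{1/p}, \qquad (3) $$

and

$$ \liminf_{N \to \infty} N^{1/d} \, V^{\mathrm{rand}}_{N,p}(\mu) \ge \underline{\kappa}^{\mathrm{unif}}_p \Big( \int \Big( \frac{d\mu_a}{d\lambda^d} \Big)^{1 - \frac{p}{d}} d\lambda^d \Big)^{1/p}. \qquad (4) $$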

Remark 2 We conjecture that $\underline{\kappa}^{\mathrm{unif}}_p = \overline{\kappa}^{\mathrm{unif}}_p$, in which case the inequality and
lim sup in (3) are actually an equality and lim. Proving the equality $\underline{\kappa}^{\mathrm{unif}}_p = \overline{\kappa}^{\mathrm{unif}}_p$
seems to be a general open problem in transport problems. Similar problems
arise in Huesmann and Sturm [HS10] for optimal transports from Poisson
point processes with Lebesgue intensity to Lebesgue measure.

Let us compare our results with the classical high resolution formulas, see
[GL00, Theorem 6.2]. The asymptotics of $V^{\mathrm{opt}}_{N,p}(\mu)$ defined in (1) satisfies

$$ \lim_{N \to \infty} N^{1/d} \, V^{\mathrm{opt}}_{N,p}(\mu) = c_{p,d} \Big( \int \Big( \frac{d\mu_a}{d\lambda^d} \Big)^{\frac{d}{d+p}} d\lambda^d \Big)^{\frac1d + \frac1p} \qquad (5) $$

whenever $\mu$ has a finite moment of order $q$ for some $q > p$. Here, the constant
$c_{p,d}$ is the corresponding limit for the uniform distribution on the unit cube in
$\mathbb{R}^d$. Its numerical value is known in a few special cases.

Theorem 1 can be used to improve [GL00, Theorem 9.1(a)]: there the validity
of an asymptotic formula for the random quantization error is shown to be
equivalent to the uniform integrability of $(N^{p/d} \min_{1 \le i \le N} \|X - Y_i\|^p)_{N \ge 1}$ where
(for example) $X, Y_1, \dots$ are independent with law $\mu$. Theorem 1 shows that
uniform integrability holds provided that $1 \le p < d/2$ and $\mu$ has a finite moment
of order $q$ for some $q > \frac{dp}{d-p}$.

Note that the integral term on the right hand side of (5) differs from the
one in (3) and (4). This effect can be explained as follows: for a sequence of
optimal codebooks $(C(N))_{N \ge 1}$ of size $N$ the empirical measures $\frac{1}{N} \sum_{x \in C(N)} \delta_x$
tend to a measure that differs from $\mu$. In fact optimal codebooks allocate more
points in the tails of the distribution. Since our approach does not account for
such a correction, it is natural to expect a loss of efficiency for heavy tailed
distributions. For arbitrary codebooks whose empirical distributions tend to
the measure $\mu$, one has lower bounds which incorporate the same integral term
as in our high resolution formula, see [Der09, Thm. 7.2].
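To get a feeling for the size of the two integral terms, one can evaluate them in closed form for a standard normal density $f$ on $\mathbb{R}^d$, for which $\int f^a \, d\lambda^d = (2\pi)^{d(1-a)/2} a^{-d/2}$. The sketch below is an illustration only. It assumes that the term of the classical formula reads $(\int f^{d/(d+p)} d\lambda^d)^{1/d + 1/p}$ and that the term of the formula for empirical measures reads $(\int f^{1 - p/d} d\lambda^d)^{1/p}$; the function names are hypothetical, and the constants $c_{p,d}$ and $\overline{\kappa}^{\mathrm{unif}}_p$ are deliberately left out.

```python
from math import pi

def gaussian_power_integral(a, d):
    """Closed form of int f^a d(lambda^d) for the standard normal density f on R^d."""
    return (2 * pi) ** (d * (1 - a) / 2) * a ** (-d / 2)

def optimal_term(p, d):
    """Integral term assumed for optimal quantization: (int f^(d/(d+p)))^(1/d + 1/p)."""
    return gaussian_power_integral(d / (d + p), d) ** (1 / d + 1 / p)

def empirical_term(p, d):
    """Integral term assumed for empirical measures: (int f^(1 - p/d))^(1/p)."""
    return gaussian_power_integral(1 - p / d, d) ** (1 / p)

if __name__ == "__main__":
    for d, p in [(3, 1.0), (5, 2.0), (10, 1.0)]:
        print(d, p, optimal_term(p, d), empirical_term(p, d))
```

By Hölder's inequality the first term never exceeds the second for a probability density, which is consistent with the loss of efficiency described above.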

A high resolution formula is also available for quantization with random
codebooks and optimally chosen weights. As a consequence of [GL00, Theorem
9.1(a)] and Theorem 1 one has under the assumption of Theorem 2 (without
the Riemann integrability) equality in (3) for a different constant. Indeed,
Theorem 1 allows to verify an integrability assumption in [GL00, Theorem 9.1(a)]
and thus to improve the result. As a consequence, postprocessing of the weights
can in the limit improve the error by a constant factor, irrespective of the
distribution $\mu$.

1.3 Preliminaries 

For a finite signed measure $\mu$ on the Borel sets of $\mathbb{R}^d$, we write $\|\mu\| := |\mu|(\mathbb{R}^d)$
for its total variation norm (using the same symbol as for the norm on $\mathbb{R}^d$ should
not cause any confusion). For finite (nonnegative) measures $\mu$ and $\nu$ we denote
by $\mu \wedge \nu$ the largest measure that is dominated by $\mu$ and $\nu$. Furthermore, we
set $(\mu - \nu)_+ := \mu - \mu \wedge \nu$.

Next, we introduce concatenation of transports. A transport $\xi$, i.e. a finite
measure $\xi$ on $\mathbb{R}^d \times \mathbb{R}^d$, will be associated to a probability kernel $K$ and a measure
$\nu$ on $\mathbb{R}^d$ via

$$ \xi(dx, dy) = \nu(dx) K(x, dy), \qquad (6) $$

so $\nu$ is the first marginal of $\xi$. We call $\xi$ the transport with source $\nu$ and kernel
$K$. Let $\mathcal{K}$ denote the set of probability kernels from $(\mathbb{R}^d, \mathcal{B}^d)$ into itself and
consider the semigroup $(\mathcal{K}, *)$, where the operation $*$ is defined via

$$ K_1 * K_2(x, A) := \int K_1(x, dz) K_2(z, A) \qquad (x \in \mathbb{R}^d, \ A \in \mathcal{B}^d). $$

Now we can iterate transport schedules: Let $\nu_0, \dots, \nu_n$ be measures on $\mathbb{R}^d$ with
identical total mass and let $\xi_k \in \mathcal{M}(\nu_{k-1}, \nu_k)$ for $k = 1, \dots, n$. Then the concatenation
of the transports $\xi_1, \dots, \xi_n$ is formally the transport described by the source $\nu_0$ and
the probability kernel $K = K_1 * \dots * K_n$, where $K_1, \dots, K_n$ are the kernels
associated to $\xi_1, \dots, \xi_n$. Note that the relation (6) defines the kernel uniquely
up to $\nu$-nullsets so that the concatenation of transport schedules is a well-defined
operation on the set of transports. In analogy to the operation $*$ on $\mathcal{K}$, we write
$\xi_1 * \dots * \xi_n$ for the concatenation of the transport schedules.
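On a finite state space the concatenation of transports reduces to matrix multiplication, which may help to fix ideas. The following sketch is an illustration under that simplifying assumption (finite state space, kernels as row-stochastic matrices); the helper names are hypothetical and not taken from the article.

```python
def compose(k1, k2):
    """(K1 * K2)(x, y) = sum_z K1(x, z) K2(z, y) for kernels on a finite state space."""
    return [[sum(k1[x][z] * k2[z][y] for z in range(len(k2)))
             for y in range(len(k2[0]))]
            for x in range(len(k1))]

def transport(source, kernel):
    """xi(x, y) = nu(x) K(x, y): the transport with source nu and kernel K."""
    return [[source[x] * kernel[x][y] for y in range(len(kernel[x]))]
            for x in range(len(source))]

def marginals(xi):
    """First and second marginal of a transport given as a matrix."""
    first = [sum(row) for row in xi]
    second = [sum(xi[x][y] for x in range(len(xi))) for y in range(len(xi[0]))]
    return first, second

if __name__ == "__main__":
    nu0 = [0.5, 0.5]
    k1 = [[1.0, 0.0], [0.5, 0.5]]
    k2 = [[0.5, 0.5], [0.0, 1.0]]
    nu1 = marginals(transport(nu0, k1))[1]   # target of the first transport
    nu2 = marginals(transport(nu1, k2))[1]   # target of the second transport
    # the concatenated transport moves nu0 directly to nu2
    print(marginals(transport(nu0, compose(k1, k2))), nu2)
```

The second marginal of the concatenated transport agrees with the measure obtained by applying the two transports one after the other.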

We summarize elementary properties of the Wasserstein metric in a lemma. 

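Lemma 1 Let $\xi, \mu, \mu_1, \dots$ and $\nu, \nu_1, \dots$ be finite measures on $\mathbb{R}^d$ such that
$\|\xi\| = \|\mu\| = \|\nu\|$.

(i) Convexity: Suppose that $\mu = \sum_{k \in \mathbb{N}} \mu_k$ and $\nu = \sum_{k \in \mathbb{N}} \nu_k$ and that for
all $k \in \mathbb{N}$, $\|\mu_k\| = \|\nu_k\|$. Then

$$ \rho_p^p(\mu, \nu) \le \sum_{k=1}^{\infty} \rho_p^p(\mu_k, \nu_k). \qquad (7) $$

(ii) Triangle-inequality: One has

$$ \rho_p(\mu, \nu) \le \rho_p(\mu, \xi) + \rho_p(\xi, \nu). \qquad (8) $$

(iii) Translation and scaling: Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a map, which consists of
a translation and a scaling by the factor $a > 0$. Then

$$ \rho_p(\mu \circ T^{-1}, \nu \circ T^{-1}) = a \, \rho_p(\mu, \nu). \qquad (9) $$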



2 Proof of the Pierce type result 

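In order to prove Theorem 1, we first derive an estimate for general distributions
on the unit cube $[0,1)^d$.

Proposition 1 Let $1 \le p < \frac{d}{2}$. There exists a constant $\kappa^{\mathrm{cube}}_p \in (0, \infty)$ such
that for any probability measure $\mu$ on $[0,1)^d$ and $N \in \mathbb{N}$

$$ V^{\mathrm{rand}}_{N,p}(\mu) \le \kappa^{\mathrm{cube}}_p \, N^{-1/d}. $$

Remark 3 The constant $\kappa^{\mathrm{cube}}_p$ is explicit. Let $D = \sup_{x,y \in [0,1)^d} \|x - y\|$ denote
the diameter of $[0,1)^d$. Then

$$ \kappa^{\mathrm{cube}}_p = D \, 2^{\frac{d}{2p} - \frac{1}{p}} \Big[ \frac{1}{1 - 2^{p - \frac{d}{2}}} + \frac{1}{1 - 2^{-p}} \Big]^{1/p}. $$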

For the proof of Proposition 1 we use a nested sequence of partitions of
$B = [0,1)^d$. Note that $B$ can be partitioned into $2^d$ translates $B_1, \dots, B_{2^d}$ of
$2^{-1}B$. We iterate this procedure and partition each set $B_k$ into $2^d$ translates
$B_{k,1}, \dots, B_{k,2^d}$ of $2^{-2}B$. We continue this scheme obeying the rule that each set
$B_{k_1,\dots,k_l}$ is partitioned into $2^d$ translates $B_{k_1,\dots,k_l,1}, \dots, B_{k_1,\dots,k_l,2^d}$ of $2^{-(l+1)}B$.
These translates of $2^{-l}B$ form a partition of $B$ and we denote this collection of
sets by $\mathcal{P}_l$, the $l$th level. We now endow the set $\mathcal{P} := \bigcup_{l=0}^{\infty} \mathcal{P}_l$ with a $2^d$-ary tree
structure. $B$ denotes the root of the tree and the father of a set $C \in \mathcal{P}_l$ ($l \in \mathbb{N}$)
is the unique set $F \in \mathcal{P}_{l-1}$ that contains $C$.

Lemma 2 Let $\mu$ and $\nu$ be two probability measures supported on $B$ such that
for all $C \in \mathcal{P}$

$$ \nu(C) > 0 \ \Rightarrow \ \mu(C) > 0. $$

Then

$$ \rho_p^p(\mu, \nu) \le \frac12 D^p \sum_{l=0}^{\infty} 2^{-pl} \sum_{F \in \mathcal{P}_l} \ \sum_{C \text{ child of } F} \Big| \nu(C) - \nu(F) \frac{\mu(C)}{\mu(F)} \Big| $$

with the convention that $\frac{0}{0} = 0$.
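The bound of Lemma 2 can be evaluated directly for simple measures. The sketch below is an illustration and not part of the original proof. It assumes that the bound reads $\rho_p^p(\mu,\nu) \le \frac12 D^p \sum_l 2^{-pl} \sum_{F \in \mathcal{P}_l} \sum_{C \text{ child of } F} |\nu(C) - \nu(F)\mu(C)/\mu(F)|$, and it works in dimension $d = 1$ (so $D = 1$), outside the range $d \ge 3$ of the article, because there the exact $L^1$-Wasserstein distance is available through distribution functions. Both measures are taken to be atomic with atoms at the left endpoints of the cells of level $L$, so that all levels $l \ge L$ contribute nothing.

```python
def dyadic_bound(mu, nu, p=1.0):
    """Right-hand side of the dyadic bound in d = 1 (diameter D = 1).

    mu, nu: probability vectors of length 2**L; entry k is an atom at k / 2**L.
    """
    n = len(mu)
    levels = n.bit_length() - 1
    assert n == 1 << levels, "length must be a power of two"
    assert all(m > 0 or v == 0 for m, v in zip(mu, nu)), "nu(C) > 0 must imply mu(C) > 0"
    total = 0.0
    for l in range(levels):
        width = n >> l              # number of finest cells inside a level-l cell F
        half = width // 2
        for start in range(0, n, width):
            mu_f = sum(mu[start:start + width])
            nu_f = sum(nu[start:start + width])
            for c in (start, start + half):          # the two children of F
                mu_c = sum(mu[c:c + half])
                nu_c = sum(nu[c:c + half])
                ratio = mu_c / mu_f if mu_f > 0 else 0.0   # convention 0/0 = 0
                total += 2.0 ** (-p * l) * abs(nu_c - nu_f * ratio)
    return 0.5 * total

def wasserstein1_on_grid(mu, nu):
    """Exact L^1-Wasserstein distance in d = 1: integral of |F_mu - F_nu|."""
    n = len(mu)
    gap, cdf_mu, cdf_nu = 0.0, 0.0, 0.0
    for k in range(n - 1):
        cdf_mu += mu[k]
        cdf_nu += nu[k]
        gap += abs(cdf_mu - cdf_nu)
    return gap / n

if __name__ == "__main__":
    mu = [0.125] * 8
    nu = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]
    print(wasserstein1_on_grid(mu, nu), dyadic_bound(mu, nu, p=1.0))
```

In this example the tree bound is larger than the exact distance, as it should be; the bound trades sharpness for a form that only involves cell masses.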

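For the proof we use couplings defined via partitions. Let $(A_k)$ be a (finite
or countably infinite) Borel partition of the Borel set $A \subset \mathbb{R}^d$. For two finite
measures $\mu_1, \mu_2$ on $A$ with equal masses, we call the measure $\nu$ defined by

$$ \nu|_{A_k} = \frac{\mu_2(A_k)}{\mu_1(A_k)} \, \mu_1|_{A_k} $$

the $(A_k)$-approximation of $\mu_1$ to $\mu_2$ provided that it is well defined (i.e. that
$\mu_1(A_k) = 0$ implies $\mu_2(A_k) = 0$).

The $(A_k)$-approximation $\nu$ is associated to a transport from $\mu_1$ to $\nu$. Note
that

$$ (\mu_1 \wedge \nu)|_{A_k} = \frac{\mu_1(A_k) \wedge \mu_2(A_k)}{\mu_1(A_k)} \, \mu_1|_{A_k}, $$

and we define a transport $\xi \in \mathcal{M}(\mu_1, \nu)$ via

$$ \xi = (\mu_1 \wedge \nu) \circ \psi^{-1} + \frac{1}{\delta} (\mu_1 - \nu)_+ \otimes (\nu - \mu_1)_+, $$

where $\delta := \frac12 \sum_k |\mu_1(A_k) - \mu_2(A_k)|$ and $\psi : \mathbb{R}^d \to \mathbb{R}^d \times \mathbb{R}^d$, $x \mapsto (x, x)$. Then

$$ \xi(\{(x,y) : x \ne y\}) = \delta. $$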
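Proof of Lemma 2. For $l \in \mathbb{N}_0$, we set

$$ \mu_l := \sum_{C \in \mathcal{P}_l} \frac{\nu(C)}{\mu(C)} \, \mu|_C, $$

which is the $\mathcal{P}_l$-approximation of $\mu$ to $\nu$. By construction, one has for each set
$F \in \mathcal{P}_l$ with $l \in \mathbb{N}_0$

$$ \mu_l(F) = \mu_{l+1}(F). $$

Moreover, provided that $\mu_l(F) > 0$, one has for each child $C$ of $F$

$$ \mu_{l+1}|_C = \frac{\mu_{l+1}(C)}{\mu_l(C)} \, \mu_l|_C, $$

so that $\mu_{l+1}|_F$ is the $\{C \in \mathcal{P}_{l+1} : C \subset F\}$-approximation of $\mu_l|_F$ to $\nu|_F$. Hence,
there exists a transport $\xi_F \in \mathcal{M}(\mu_l|_F, \mu_{l+1}|_F)$ with

$$ \xi_F(\{(x,y) : x \ne y\}) = \frac12 \sum_{C \text{ child of } F} \Big| \nu(C) - \nu(F) \frac{\mu(C)}{\mu(F)} \Big|. \qquad (10) $$

Since each family $\mathcal{P}_l$ is a partition of the root $B$, we have

$$ \xi_{l+1} := \sum_{F \in \mathcal{P}_l} \xi_F \in \mathcal{M}(\mu_l, \mu_{l+1}). $$

Next, note that $\rho_p(\mu_l, \nu) \le D 2^{-l}$ so that $\mu_l$ converges in the $p$th Wasserstein
metric to $\nu$, which implies that

$$ \rho_p(\mu, \nu) \le \sup_{l \in \mathbb{N}} \rho_p(\mu, \mu_l). \qquad (11) $$

The concatenation of the transports $(\xi_l)_{l \in \mathbb{N}}$ leads to new transports

$$ \xi^l = \xi_1 * \dots * \xi_l \in \mathcal{M}(\mu, \mu_l). $$

Each of the transports $\xi_k$ is associated to a kernel $K_k$ and, by Ionescu-Tulcea,
there exists a sequence $(Z_l)_{l \in \mathbb{N}_0}$ of $[0,1)^d$-valued random variables with

$$ P(Z_0 \in A_0, \dots, Z_l \in A_l) = \int_{A_0} \int_{A_1} \dots \int_{A_{l-1}} K_l(x_{l-1}, A_l) \dots K_1(x_0, dx_1) \, \mu(dx_0) $$

for every $l \in \mathbb{N}$. Then the joint distribution of $(Z_0, Z_l)$ is $\xi^l$. Let

$$ L = \inf\{ l \in \mathbb{N}_0 : Z_{l+1} \ne Z_l \} $$

and note that all entries $(Z_l)_{l \in \mathbb{N}_0}$ lie in one (random) set $A \in \mathcal{P}_L$, if $\{L < \infty\}$
enters, and are identical on $\{L = \infty\}$. Hence, for any $k \in \mathbb{N}$

$$ E[\|Z_0 - Z_k\|^p] \le D^p \, E[2^{-pL}] \le D^p \sum_{l=0}^{\infty} 2^{-pl} \, P(Z_{l+1} \ne Z_l) $$

$$ = D^p \sum_{l=0}^{\infty} 2^{-pl} \, \xi_{l+1}(\{(x,y) : x \ne y\}) $$

$$ = \frac12 D^p \sum_{l=0}^{\infty} 2^{-pl} \sum_{F \in \mathcal{P}_l} \ \sum_{C \text{ child of } F} \Big| \nu(C) - \nu(F) \frac{\mu(C)}{\mu(F)} \Big|, $$

where we used (10) in the last step, so the assertion follows by (11). □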
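Proof of Proposition 1. We apply the above lemma with $\nu = \hat\mu_N$. Hence,

$$ \rho_p^p(\mu, \hat\mu_N) \le \frac12 D^p \sum_{l=0}^{\infty} 2^{-pl} \sum_{F \in \mathcal{P}_l} \ \sum_{C \text{ child of } F} \Big| \hat\mu_N(C) - \hat\mu_N(F) \frac{\mu(C)}{\mu(F)} \Big|. \qquad (12) $$

Note that conditional on the event $\{N \hat\mu_N(F) = k\}$ ($k \in \mathbb{N}$) the random vector
$(N \hat\mu_N(C))_{C \text{ child of } F}$ is multinomially distributed with parameters $k$ and success
probabilities $(\mu(C)/\mu(F))_{C \text{ child of } F}$. Hence,

$$ E\Big[ \sum_{C \text{ child of } F} \Big| \hat\mu_N(C) - \hat\mu_N(F) \frac{\mu(C)}{\mu(F)} \Big| \ \Big| \ N \hat\mu_N(F) = k \Big] $$

$$ = \frac1N \sum_{C \text{ child of } F} E\Big[ \Big| N \hat\mu_N(C) - k \frac{\mu(C)}{\mu(F)} \Big| \ \Big| \ N \hat\mu_N(F) = k \Big] $$

$$ \le \frac1N \sum_{C \text{ child of } F} \mathrm{var}\big( N \hat\mu_N(C) \,\big|\, N \hat\mu_N(F) = k \big)^{1/2} $$

$$ \le \frac1N \sum_{C \text{ child of } F} \sqrt{k \frac{\mu(C)}{\mu(F)}} \le \frac{2^{d/2}}{N} \sqrt{k}, $$

where we used Jensen's inequality in the last step. We set $\zeta(t) = \sqrt{t} \wedge t$ ($t \ge 0$)
and observe that

$$ E\Big[ \sum_{C \text{ child of } F} \Big| \hat\mu_N(C) - \hat\mu_N(F) \frac{\mu(C)}{\mu(F)} \Big| \Big] \le \frac{2^{d/2}}{N} \, \zeta(N \mu(F)). $$

Consequently, it follows from (12) and Jensen's inequality that

$$ E[\rho_p^p(\mu, \hat\mu_N)] \le D^p 2^{\frac d2 - 1} \frac1N \sum_{l=0}^{\infty} 2^{-pl} \sum_{F \in \mathcal{P}_l} \zeta(N \mu(F)) \le D^p 2^{\frac d2 - 1} \frac1N \sum_{l=0}^{\infty} 2^{(d-p)l} \, \zeta(2^{-dl} N). $$

Let $l^* := \lfloor \log_2 N^{1/d} \rfloor$. Then,

$$ E[\rho_p^p(\mu, \hat\mu_N)] \le D^p 2^{\frac d2 - 1} N^{-1} \Big[ \sum_{l=0}^{l^*} 2^{(\frac d2 - p) l} \sqrt{N} + \sum_{l=l^*+1}^{\infty} 2^{-pl} N \Big] $$

$$ \le D^p 2^{\frac d2 - 1} N^{-1} \Big[ 2^{(\frac d2 - p) l^*} \sqrt{N} \sum_{k=0}^{\infty} 2^{-(\frac d2 - p) k} + 2^{-p(l^*+1)} N \sum_{k=0}^{\infty} 2^{-pk} \Big] $$

$$ \le D^p 2^{\frac d2 - 1} N^{-p/d} \Big[ \frac{1}{1 - 2^{p - \frac d2}} + \frac{1}{1 - 2^{-p}} \Big], $$

so the assertion follows. □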



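We are now in the position to prove Theorem 1. Since all norms on $\mathbb{R}^d$ are
equivalent, it suffices to prove the result for the maximum norm $\|\cdot\|_{\max}$.

Theorem 3 Let $p \in [1, \frac{d}{2})$ and $q > \frac{dp}{d-p}$. One has for any probability
measure $\mu$ on $\mathbb{R}^d$ that

$$ V^{\mathrm{rand}}_{N,p}(\mu) \le \kappa^{\mathrm{Pierce}}_{p,q} \Big[ \int \|x\|_{\max}^q \, d\mu(x) \Big]^{1/q} N^{-1/d}, \qquad (13) $$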

2"-^^ , 2 p +" (1 ' p/d) (^ ubC ) P 



1/p 



Proof. By the scaling invariance of inequality (13), we can and will assume
without loss of generality that $\int \|x\|_{\max}^q \, d\mu(x) = 1$. We partition $\mathbb{R}^d$ into a
sequence of sets $(B_n)_{n \in \mathbb{N}_0}$ defined as

$$ B_0 := B := [-1, 1)^d \quad \text{and} \quad B_n := (2^n B) \setminus (2^{n-1} B) \ \text{ for } n \in \mathbb{N}. $$

We denote by $\nu$ the random $(B_n)$-approximation of $\mu$ to $\hat\mu_N$, that is

$$ \nu|_{B_n} = \frac{\hat\mu_N(B_n)}{\mu(B_n)} \, \mu|_{B_n} \quad \text{for } n \in \mathbb{N}_0. $$

Then $\xi = (\mu \wedge \nu) \circ \psi^{-1} + \delta^{-1} (\mu - \nu)_+ \otimes (\nu - \mu)_+$ with $\delta := \|(\mu - \nu)_+\| = \|(\nu - \mu)_+\|$
and $\psi : \mathbb{R}^d \to \mathbb{R}^d \times \mathbb{R}^d$, $x \mapsto (x, x)$ defines a transport in $\mathcal{M}(\mu, \nu)$ such that


x - y\\ p H&x, Ay) =5^ / \\x - y\\ p - u) + (dx) {v - fi) + (dy) 

JR d JR d 

<2 P ^ [ \\x\\ p ( f ,-is)+(dx)+2 p - 1 [ \\y\\ p (u-fx) + (dy) 

OO „ oo „ 

< 2'- 1 X! / (a* - ^) + (dx) + 2'- 1 £ / \\y\\ p iy v) + (dy) 

oo 

<2P- 1 Y / * P 2 np -\v-v\(Bn)- 

n=0 

Note that $N \hat\mu_N(B_n) \sim \mathrm{Bin}(N, \mu(B_n))$ and that by the Markov inequality

$$ \mu(B_n) \le 2^{-q(n-1)} \int \|x\|_{\max}^q \, d\mu(x) = 2^{-q(n-1)}. \qquad (14) $$

The inequality remains true for $n = 0$. Thus

oo 

W'")] < Y, 2P ~ l2nP * PE MBn) - fi N (B n )\] 

oo 

< ^2 ?, - 1 2" p p 7V-^(B„)5 (15) 

n=0 

oo 2 P+f-l 

< 2 p+ ^- 1 d p N-^ V 2™ (p -59) = qpn-^. 

^ 1 _ 2P-2? 

n— 

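It remains to analyse $E[\rho_p^p(\nu, \hat\mu_N)]$. Given that $\{N \hat\mu_N(B_n) = k\}$ the random
measure $\frac{N}{k} \hat\mu_N|_{B_n}$ is the empirical measure of $k$ independent $\frac{\mu|_{B_n}}{\mu(B_n)}$-distributed
random variables. By Lemma 1 (i) and Proposition 1,

$$ E[\rho_p^p(\nu, \hat\mu_N)] \le \sum_{n=0}^{\infty} E\big[ \rho_p^p(\nu|_{B_n}, \hat\mu_N|_{B_n}) \big] $$

$$ \le \sum_{n=0}^{\infty} \sum_{k=1}^{\infty} P(N \hat\mu_N(B_n) = k) \, \frac{k}{N} \, 2^{(n+1)p} (\kappa^{\mathrm{cube}}_p)^p k^{-p/d}. $$

Using that $E[\hat\mu_N(B_n)] = \mu(B_n)$, we conclude with Jensen's inequality that

$$ E[\rho_p^p(\nu, \hat\mu_N)] \le (\kappa^{\mathrm{cube}}_p)^p N^{-p/d} \sum_{n=0}^{\infty} 2^{(n+1)p} \mu(B_n)^{1 - p/d}. $$

We use again inequality (14) to derive

$$ E[\rho_p^p(\nu, \hat\mu_N)] \le (\kappa^{\mathrm{cube}}_p)^p N^{-p/d} \sum_{n=0}^{\infty} 2^{(n+1)p - q(n-1)(1 - \frac{p}{d})} = (\kappa^{\mathrm{cube}}_p)^p \, \frac{2^{p + q(1 - p/d)}}{1 - 2^{-q(1 - p/d) + p}} \, N^{-p/d}. $$

Note that $\frac{p}{d} < \frac12$ and altogether, we finish the proof by applying the triangle
inequality (property (ii) of Lemma 1) and equation (15) to deduce that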



1 - 2P~ 



1 — 2~'J( 1 ~P/ d )+P 



i/p 



AT-V<i. 



,-Picr 



□ 



3 Asymptotic analysis of the uniform measure 

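Next, we investigate the asymptotics of the random quantization of the uniform
distribution $U$ on the unit cube $B = [0,1)^d$. The aim of this section is to
prove the existence of the limits

$$ \overline{\kappa}^{\mathrm{unif}}_p := \lim_{N \to \infty} N^{1/d} \, V^{\mathrm{rand}}_{N,p}(U), \qquad \underline{\kappa}^{\mathrm{unif}}_p := \lim_{N \to \infty} N^{1/d} \, \underline{V}^{\mathrm{rand}}_{N,p}, $$

which is the first statement of Theorem 2.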

Notation 1 Let $A$ and $S$ denote two sets with $A \subset S$ and suppose that $v =
(v_j)_{j=1,\dots,N}$ is an $S$-valued vector. We call the vector $v_A$ consisting of all entries
of $v$ in $A$ the $A$-subvector of $v$, that is

$$ v_A := (v_{\gamma(j)}), $$

where $(\gamma(j))$ is an enumeration of the entries of $v$ in $A$.

For a Borel set $A$ with finite nonvanishing Lebesgue measure, we denote by
$U(A)$ the uniform distribution on $A$. The proof of the existence of the limit
makes use of the following lemma.

Lemma 3 Let $K \in \mathbb{N}$ and let $A, A_1, \dots, A_K \subset \mathbb{R}^d$ be Borel sets such that
$\lambda^d(A) < \infty$ and that the sets $A_1, \dots, A_K \subset \mathbb{R}^d$ are pairwise disjoint and cover
$A$. Fix $N \in \mathbb{N}$ and suppose that $\ell_k := N \cdot \frac{\lambda^d(A_k \cap A)}{\lambda^d(A)} \in \mathbb{N}$ for $k = 1, \dots, K$.

Assume that $X = (X_1, \dots, X_N)$ is a random vector consisting of independent
$U(A)$-distributed entries. Then one can couple $X$ with a random vector
$Y = (Y_1, \dots, Y_N)$ which has $A_k$-subvectors consisting of $\ell_k$ independent $U(A_k)$-
distributed entries such that the individual subvectors are independent and such
that

$$ E\Big[ \sum_{j=1}^{N} 1_{\{X_j \ne Y_j\}} \Big] \le \frac{\sqrt{KN}}{2}. \qquad (16) $$

Proof. For $k = 1, \dots, K$, denote by $X^{(k)}$ the $A_k$-subvector of $X$. For each
$k$ with $\ell_k \le \mathrm{length}(X^{(k)})$, we keep the first $\ell_k$ entries of $X$ in $A_k$ and erase
the remaining ones. For any other $k$'s, we fill up $\ell_k - \mathrm{length}(X^{(k)})$ of the
empty places by independent $U(A_k)$-distributed elements. Denoting the new
vector by $Y$, we see that $Y$ has $A_k$-subvectors of length $\ell_k$. Clearly, $Y$ has
independent subvectors that are uniformly distributed on the respective sets.
Since the length of the $A_k$-subvector is binomially distributed with parameters
$N$ and $q_k := \frac{\lambda^d(A_k \cap A)}{\lambda^d(A)}$, we get

$$ E\Big[ \sum_{j=1}^{N} 1_{\{X_j \ne Y_j\}} \Big] = \frac12 E\Big[ \sum_{k=1}^{K} \big| \mathrm{length}(X^{(k)}) - \ell_k \big| \Big] \le \frac12 \sum_{k=1}^{K} \mathrm{var}\big( \mathrm{length}(X^{(k)}) \big)^{1/2} $$

$$ \le \frac12 \sum_{k=1}^{K} \sqrt{N q_k} \le \frac12 \sqrt{KN}. $$

□

Proof of the first statement of (i) of Theorem 2. Let $M \in \mathbb{N}$ be arbitrary
but fixed. Further, let $N \in \mathbb{N}$, $N \ge 2^d M$, and denote by $B_0 = [0, a)^d$, $a^d = \frac{M}{N}$,
the cube with volume $\lambda^d(B_0) = \frac{M}{N}$. We divide $[0,1)^d$ into two parts, the main
one $B^{\mathrm{main}} := [0, \lfloor 1/a \rfloor a)^d$ and the remainder $B^{\mathrm{rem}} := [0,1)^d \setminus B^{\mathrm{main}}$. Note that
$\lambda^d(B^{\mathrm{rem}}) \to 0$ as $N \to \infty$. We represent $B^{\mathrm{main}}$ as the union of $n = \lfloor a^{-1} \rfloor^d$
pairwise disjoint translates $B_1, \dots, B_n$ of $B_0$:

$$ B^{\mathrm{main}} = \bigcup_{k=1}^{n} B_k. $$

Let $X = (X_1, \dots, X_N)$ denote a vector of $N$ independent $U[0,1)^d$-distributed
entries. We shall now couple $X$ with a random vector $Y = (Y_1, \dots, Y_N)$ in such
a way that most of the entries of $X$ and $Y$ coincide and such that the $B_k$-
subvectors are independent and consist of $M$ independent $U(B_k)$-distributed
entries. To achieve this goal we successively apply Lemma 3 to construct random
vectors $X^0, \dots, X^L$ and finally set $X^L = Y$. First we apply the coupling for $X$
with the decomposition $[0,1)^d = B^{\mathrm{main}} \cup B^{\mathrm{rem}}$ and denote by $X^0$ the resulting
vector. In the next step a $2^d$-ary tree $\mathcal{T}$ with leaves being the boxes $B_1, \dots, B_n$
is used to define further couplings. We let $L$ denote the smallest integer with
$2^L B_0 \supset B^{\mathrm{main}}$, i.e. $L = \lceil -\log_2 a \rceil$, and set

$$ \mathcal{T}_l := \{ \gamma + 2^{L-l} B_0 : \gamma \in (2^{L-l} a \mathbb{Z}^d) \cap B^{\mathrm{main}} \} $$

for $l = 0, \dots, L$. Now $\mathcal{T}$ is defined as the rooted tree which has at level $l$ the
boxes (vertices) $\mathcal{T}_l$ and a box $A_{\mathrm{child}} \in \mathcal{T}_l$ is the child of a box $A_{\mathrm{parent}} \in \mathcal{T}_{l-1}$ if
$A_{\mathrm{child}} \subset A_{\mathrm{parent}}$.

We associate the vector $X^0$ with the 0th level of the tree. Now we define
consecutively $X^1, \dots, X^L$ via the following rule. Suppose that $X^l$ has already
been defined. For each $A \in \mathcal{T}_l$ we apply the above coupling independently to
the $A$-subvector of $X^l$ with the representation

$$ A = \bigsqcup_{B \text{ child of } A} B. $$

By induction, for each $A \in \mathcal{T}_l$, the $A$-subvector of $X^l$ consists of $N \lambda^d(A) \in \mathbb{N}$
independent $U(A)$-distributed random variables. In particular, this is valid for
the last level $Y = X^L$.

We proceed with an error analysis. Fix $\omega \in \Omega$ and $j \in \{1, \dots, N\}$ and
suppose that $X^0_j(\omega), \dots, X^L_j(\omega)$ is altered in the step $l \to l+1$ for the first time
and that $X^0_j(\omega) \in B \in \mathcal{T}_l$. Then it follows that $X^L_j(\omega) \in B$ so that

$$ \|X^0_j(\omega) - X^L_j(\omega)\| \le \mathrm{diameter}(B) \le a D 2^{L-l}, $$

where $D$ is the diameter of $[0,1)^d$. Consequently,

N N L-l 



3=1 



< E [EE 1 {xj^}M2 I '- i ) p 



3=1 i=0 

By Lemma 3 and the Cauchy-Schwarz inequality, one has, for $l = 1, \dots, L$,



E 



N 

[E Vj^" 1 } 



3=1 



2 A67I-! 2 



Together with the former estimate we get 

N , L 



3=1 



~ 2 v ; ^ ~ 2 i _ 

i=i x ^ 



AT 



Next, we use that $a = (\frac{M}{N})^{1/d}$ and $2^L \le \frac{2}{a}$ to conclude that

2 d/2-lj,p 

ii a ; -A/irj < T 

"3=1 



JV 



1 _ 



_ ,£ 1 _ _1 P 



Hence, there exists a constant $C$ that does not depend on $N$ and $M$ such that



E 



1 N 



3=1 



1 JV 1 JV 

<E[l^||X,-^ r ] +e[15>« 

3=1 3=1 



i/p 



(17) 



By construction, $Y$ has for each $k = 1, \dots, n$, a $B_k$-subvector of $M$
independent $U(B_k)$-distributed random variables and we denote the corresponding
empirical measure by $\hat\mu^{(k)}_M$. Moreover, its $B^{\mathrm{rem}}$-subvector contains $N - nM$
independent $U(B^{\mathrm{rem}})$-distributed entries and we denote its empirical measure by
$\hat\mu^{\mathrm{rem}}_{N-nM}$. Letting $\hat\mu^Y_N$ denote the empirical measure of $Y$, we conclude with
Lemma 1 and Proposition 1 that

$$ N \, E[\rho_p^p(\hat\mu^Y_N, U)] \le \sum_{k=1}^{n} M \, E[\rho_p^p(\hat\mu^{(k)}_M, U(B_k))] + (N - nM) \, E[\rho_p^p(\hat\mu^{\mathrm{rem}}_{N-nM}, U(B^{\mathrm{rem}}))] $$

$$ \le nM a^p \big( V^{\mathrm{rand}}_{M,p}(U) \big)^p + (\kappa^{\mathrm{cube}}_p)^p (N - nM)^{1 - p/d}. \qquad (18) $$

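Next, we let $N$ tend to infinity and combine the above estimates. Note that
$N^{1/d} a = M^{1/d}$ and $\frac{nM}{N} \to 1$ so that

$$ \limsup_{N \to \infty} N^{1/d} \, E[\rho_p^p(\hat\mu^Y_N, U)]^{1/p} \le M^{1/d} \, V^{\mathrm{rand}}_{M,p}(U). $$

Moreover, (17) implies that

$$ \limsup_{N \to \infty} N^{1/d} \, E[\rho_p^p(\hat\mu^X_N, \hat\mu^Y_N)]^{1/p} \le C M^{-(\frac{1}{2p} - \frac{1}{d})}, $$

where $\hat\mu^X_N$ denotes the empirical measure of $X$. Now fix $\varepsilon \in (0, 1]$ arbitrarily
and let $M \ge \frac{1}{\varepsilon}$ such that

$$ M^{1/d} \, V^{\mathrm{rand}}_{M,p}(U) \le \liminf_{N \to \infty} N^{1/d} \, V^{\mathrm{rand}}_{N,p}(U) + \varepsilon. $$

Then

$$ \limsup_{N \to \infty} N^{1/d} \, V^{\mathrm{rand}}_{N,p}(U) \le M^{1/d} \, V^{\mathrm{rand}}_{M,p}(U) + C M^{-(\frac{1}{2p} - \frac{1}{d})} $$

$$ \le \liminf_{N \to \infty} N^{1/d} \, V^{\mathrm{rand}}_{N,p}(U) + \varepsilon + C \varepsilon^{\frac{1}{2p} - \frac{1}{d}} $$

and letting $\varepsilon \downarrow 0$ finishes the proof. □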

Proof of the second statement of (i) of Theorem 2. The proof of the
second statement is very similar to the proof of the first statement. The crucial
difference is that the arguments are now based on superadditivity, compared to
the subadditivity of the Wasserstein metric (in the sense of part (i) of Lemma
1) that was used in the proof of the first statement.

We now look at a nonsymmetric modified version of the Wasserstein distance
that allows leakage at the boundaries. For two probability measures $\nu_1$ and $\nu_2$
on $[0,1]^d$ we define

$$ \underline{\rho}_p(\nu_1, \nu_2) := \inf_{\nu_1' \in \mathcal{A}(\nu_1)} \rho_p(\nu_1', \nu_2), $$

where $\mathcal{A}(\nu_1)$ denotes all probability measures $\nu_1'$ on $[0,1]^d$ which satisfy $\nu_1'(A) \le
\nu_1(A)$ for all Borel sets $A$ in $(0,1)^d$.

We make use of the same notation as in the proof of the first statement.
First note that, similarly as in (18),

$$ N \, E[\underline{\rho}_p^p(U, \hat\mu^Y_N)] \ge nM a^p \big( \underline{V}^{\mathrm{rand}}_{M,p} \big)^p. $$

Since, in general,

$$ \underline{\rho}_p(U, \hat\mu^Y_N) \le \underline{\rho}_p(U, \hat\mu^X_N) + \rho_p(\hat\mu^X_N, \hat\mu^Y_N), $$

we conclude that

$$ \liminf_{N \to \infty} N^{1/d} \, E[\underline{\rho}_p^p(U, \hat\mu^X_N)]^{1/p} \ge \liminf_{N \to \infty} N^{1/d} \, E[\underline{\rho}_p^p(U, \hat\mu^Y_N)]^{1/p} - \limsup_{N \to \infty} N^{1/d} \, E[\rho_p^p(\hat\mu^X_N, \hat\mu^Y_N)]^{1/p} $$

$$ \ge \liminf_{N \to \infty} N^{1/d} (nM/N)^{1/p} a \, \underline{V}^{\mathrm{rand}}_{M,p} - C M^{-(\frac{1}{2p} - \frac{1}{d})}. $$

The proof is finished as above. □



4 Proof of the high resolution formula 

4.1 Proof of the high resolution formula for general p 

Definition 2 We call a finite measure $\mu$ on $\mathbb{R}^d$ approachable from below, if there
exists for any $\varepsilon > 0$ a finite number of cubes $B_1,\dots,B_n$ (which are parallel to
the coordinate axes) and positive reals $\alpha_1,\dots,\alpha_n$ such that
$\nu := \sum_{k=1}^{n} \alpha_k\, \mathcal{U}(B_k)$ satisfies

\[ \nu \le \mu \quad\text{and}\quad \|\mu - \nu\| \le \varepsilon. \]

The term approachable from above is defined analogously.
Remark 4 Since we can express a measure which is approachable from below
or above as the limit of a sequence of measures with Lebesgue density, it has
itself a Lebesgue density. Conversely, any finite measure which has a density
which is Riemann integrable on any cube, is approachable from below and above.
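
For a concrete picture of Definition 2 and of the converse statement in Remark 4, the following Python sketch builds a lower step approximation in dimension $d = 1$, with intervals playing the role of cubes. The function names, the test density $f(x) = 2x$ on $[0,1)$ and the grid-based infimum are illustrative assumptions and are not taken from the paper.

```python
def lower_step_masses(density, n_cells, grid=50):
    """Masses alpha_k of a step measure nu <= mu on [0, 1).

    On each cell [k/n, (k+1)/n) the density is replaced by (an approximation of)
    its infimum, so that nu = sum_k alpha_k * Uniform(cell_k) lies below mu.
    The infimum is approximated by a minimum over grid points including both
    endpoints, which is exact for monotone densities (assumption of this sketch).
    """
    h = 1.0 / n_cells
    masses = []
    for k in range(n_cells):
        left = k * h
        inf_k = min(density(left + j * h / grid) for j in range(grid + 1))
        masses.append(inf_k * h)  # alpha_k = infimum times cell volume
    return masses


def total_variation_gap(masses, total_mass=1.0):
    """Since nu <= mu, the total variation ||mu - nu|| is the missing mass."""
    return total_mass - sum(masses)


def f(x):
    # Hypothetical test density of a probability measure on [0, 1)
    return 2.0 * x


gap_4 = total_variation_gap(lower_step_masses(f, 4))
gap_8 = total_variation_gap(lower_step_masses(f, 8))
```

For this particular density the gap equals $1/n$ when $n$ cells are used, so any prescribed $\varepsilon > 0$ is reached by taking $n \ge 1/\varepsilon$.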

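Proposition 2 Let $\mu$ denote a compactly supported probability measure that is
approachable from below. Further let $p \in [1, d/2)$. Then

\[ \limsup_{N\to\infty} N^{1/d}\, E[\rho_p^p(\mu,\hat\mu_N)]^{1/p} \le \overline{\kappa}_p^{\mathrm{unif}} \Big( \int \Big(\frac{d\mu}{d\lambda^d}\Big)^{1-p/d} d\lambda^d \Big)^{1/p}. \]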

Proof. Let $\varepsilon > 0$ and choose a finite number of pairwise disjoint cubes
$B_1,\dots,B_K$ and positive reals $\alpha_1,\dots,\alpha_K$ such that
$\mu^* := \sum_{k=1}^{K} \alpha_k\, \mathcal{U}(B_k) \le \mu$ and $\|\mu - \mu^*\| \le \varepsilon$.
For $k = 1,\dots,K$ let $\mu^{(k)} = \mathcal{U}(B_k)$, set $\alpha_0 = \|\mu - \mu^*\|$ and
fix a probability measure $\mu^{(0)}$ such that

\[ \mu = \sum_{k=0}^{K} \alpha_k\, \mu^{(k)}. \]

For each $k$, we consider empirical measures $(\hat\mu^{(k)}_n)_{n\in\mathbb{N}}$ of a sequence of
independent $\mu^{(k)}$-distributed random variables. We assume independence of the
individual empirical measures and observe that for an additional independent
multinomial random variable $M = (M_k)_{k=0,\dots,K}$ with parameters $N$ and
$(\alpha_k)_{k=0,\dots,K}$ one has, in distribution,

\[ \hat\mu_N = \frac{1}{N} \sum_{k=0}^{K} M_k\, \hat\mu^{(k)}_{M_k}. \]

We assume without loss of generality strict equality in the last equation. Set
$\nu = \sum_{k=0}^{K} \frac{M_k}{N}\, \mu^{(k)}$ and observe that by the triangle inequality

\[ E[\rho_p^p(\mu,\hat\mu_N)]^{1/p} \le E[\rho_p^p(\mu,\nu)]^{1/p} + E[\rho_p^p(\nu,\hat\mu_N)]^{1/p}. \]

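The first expression on the right hand side is of order $O(N^{-1/(2p)})$ (see proof of
Proposition [J]). By Theorem 2 (i), there is a concave function $\varphi : [0,\infty) \to \mathbb{R}$
such that $E[n\, \rho_p^p(\mathcal{U}([0,1)^d), \widehat{\mathcal{U}([0,1)^d)}_n)] \le \varphi(n)$ for all $n \in \mathbb{N}$ and

\[ \lim_{n\to\infty} \frac{\varphi(n)}{n^{1-p/d}} = (\overline{\kappa}_p^{\mathrm{unif}})^p. \]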

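Denote by $a_1,\dots,a_K$ the edge lengths of the cubes $B_1,\dots,B_K$ and let $a_0 > 0$ be
such that the support of $\mu$ is contained in a cube with side length $a_0$. Then, by
Lemma 1 and Jensen's inequality,

\[ N\, E[\rho_p^p(\nu,\hat\mu_N)] \le \sum_{k=0}^{K} E[M_k\, \rho_p^p(\mu^{(k)},\hat\mu^{(k)}_{M_k})] \]
\[ \le (\kappa_p^{\mathrm{cube}})^p a_0^p\, E[M_0^{1-p/d}] + \sum_{k=1}^{K} a_k^p\, E[\varphi(M_k)] \]
\[ \le (\kappa_p^{\mathrm{cube}})^p a_0^p\, (\alpha_0 N)^{1-p/d} + \sum_{k=1}^{K} a_k^p\, \varphi(\alpha_k N), \]

so that, since $\alpha_0 \le \varepsilon$,

\[ \limsup_{N\to\infty} N^{p/d}\, E[\rho_p^p(\nu,\hat\mu_N)] \le (\kappa_p^{\mathrm{cube}})^p a_0^p\, \varepsilon^{1-p/d} + (\overline{\kappa}_p^{\mathrm{unif}})^p \sum_{k=1}^{K} a_k^p\, \alpha_k^{1-p/d}. \]

Note that for $x \in B_k$, $f(x) := \frac{d\mu}{d\lambda^d}(x) \ge \alpha_k / a_k^d$, and we get

\[ \sum_{k=1}^{K} a_k^p\, \alpha_k^{1-p/d} = \sum_{k=1}^{K} a_k^d \Big(\frac{\alpha_k}{a_k^d}\Big)^{1-p/d} \le \int f(x)^{1-p/d}\, dx. \]

Finally, using that the first term in the triangle inequality is of order $O(N^{-1/(2p)})$ and $p < d/2$, we arrive at

\[ \limsup_{N\to\infty} N^{p/d}\, E[\rho_p^p(\mu,\hat\mu_N)] \le (\overline{\kappa}_p^{\mathrm{unif}})^p \int f(x)^{1-p/d}\, dx + (\kappa_p^{\mathrm{cube}})^p a_0^p\, \varepsilon^{1-p/d}. \]

Letting $\varepsilon \to 0$ the assertion follows. □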
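
The proof above represents the empirical measure of a mixture by first drawing multinomial component counts and then sampling each component separately. A minimal Python sketch of that two-stage sampling is given below, using only the standard library. The function name, the weights and the one-dimensional components are hypothetical choices made for illustration, not objects from the paper.

```python
import random
from collections import Counter


def sample_via_multinomial(weights, samplers, n, rng):
    """Sample n points from the mixture sum_k weights[k] * mu^(k).

    Stage 1: draw counts M = (M_0, ..., M_K) ~ Multinomial(n, weights)
             by counting n independent categorical labels.
    Stage 2: draw M_k independent points from component k.
    """
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    label_counts = Counter(labels)
    counts = [label_counts.get(k, 0) for k in range(len(weights))]
    points = []
    for k, m_k in enumerate(counts):
        points.extend(samplers[k](rng) for _ in range(m_k))
    return counts, points


# Hypothetical example in d = 1 with two "cubes" and a remainder component
rng = random.Random(0)
weights = [0.1, 0.6, 0.3]
samplers = [
    lambda r: r.uniform(0.0, 1.0),  # stand-in for the remainder mu^(0)
    lambda r: r.uniform(0.0, 0.5),  # uniform distribution on B_1 = [0, 0.5)
    lambda r: r.uniform(0.5, 1.0),  # uniform distribution on B_2 = [0.5, 1)
]
counts, points = sample_via_multinomial(weights, samplers, 1000, rng)
```

The resulting point cloud has the same law as 1000 independent draws from the mixture, which is the distributional identity the proof relies on; the counts play the role of $(M_k)$.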

Proposition 3 Let $\mu$ be a finite singular measure on the Borel sets of $[0,1)^d$.
For $p \in [1, d/2)$, one has

\[ \lim_{N\to\infty} N^{1/d}\, V_{N,p}^{\mathrm{rand}}(\mu) = 0. \]

Proof. Without loss of generality we will assume that $\mu$ is a probability
measure. Let $\varepsilon > 0$ and choose an open set $U \subset \mathbb{R}^d$ such that $\mu(U) = 1$ and
$\lambda^d(U) \le \varepsilon$. We fix finitely many pairwise disjoint cubes $B_1,\dots,B_K$ with

\[ U \supset B_1 \cup \dots \cup B_K \quad\text{and}\quad \mu(B_1 \cup \dots \cup B_K) \ge 1 - \varepsilon. \]

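We set $B_0 = [0,1)^d \setminus (B_1 \cup \dots \cup B_K)$ and define the probability measure $\nu$ as in
Lemma 2 by $\nu := \sum_{k=0}^{K} \nu|_{B_k}$, where

\[ \nu|_{B_k} := \frac{\hat\mu_N(B_k)}{\mu(B_k)}\, \mu|_{B_k}. \]

Then the vector $Z := (N \hat\mu_N(B_k))_{k=0,\dots,K}$ is multinomially distributed with
parameters $N$ and $(\mu(B_k))_{k=0,\dots,K}$. Hence, by Lemma 2,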

E[^( M ^)] 1/P < (^f P E E l^ -^(^)l) 1/P = O^- 1 ^). (19) 

fe=0 

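We denote by $a_1,\dots,a_K$ the edge lengths of the cubes $B_k$, i.e. $a_k = \lambda^d(B_k)^{1/d}$,
and set $a_0 = 1$. Note that $\nu|_{B_k}$ and $\hat\mu_N|_{B_k}$ have the same mass for all $k$. We
apply Lemma 1, Proposition 1 and Jensen's inequality to deduce that

\[ E[\rho_p^p(\nu,\hat\mu_N)] \le (\kappa_p^{\mathrm{cube}})^p N^{-p/d} \sum_{k=0}^{K} a_k^p\, \mu(B_k)^{1-p/d}. \]

Next, we apply Hölder's inequality with exponents $d/p$ and $(1 - p/d)^{-1}$ to the sum over $k = 1,\dots,K$ to get

\[ E[\rho_p^p(\nu,\hat\mu_N)] \le (\kappa_p^{\mathrm{cube}})^p \Big( \sum_{k=1}^{K} a_k^d \Big)^{p/d} \Big( \sum_{k=1}^{K} \mu(B_k) \Big)^{1-p/d} N^{-p/d} + (\kappa_p^{\mathrm{cube}})^p\, \mu(B_0)^{1-p/d} N^{-p/d} \]
\[ \le (\kappa_p^{\mathrm{cube}})^p \big( \varepsilon^{p/d} + \varepsilon^{1-p/d} \big) N^{-p/d}, \]

where we used that $\sum_{k=1}^{K} a_k^d \le \lambda^d(U) \le \varepsilon$ and $\mu(B_0) \le \varepsilon$.

It follows from (19) and the triangle inequality that

\[ \limsup_{N\to\infty} N^{1/d}\, E[\rho_p^p(\mu,\hat\mu_N)]^{1/p} \le \kappa_p^{\mathrm{cube}} \big( \varepsilon^{p/d} + \varepsilon^{1-p/d} \big)^{1/p}, \]

which finishes the proof since $\varepsilon > 0$ is arbitrary. □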

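Theorem 4 Let $p \in [1, \frac{d}{2})$ and let $\mu$ denote a probability measure on $\mathbb{R}^d$ with
finite $q$th moment for some $q > \frac{dp}{d-p}$. If the absolutely continuous part $\mu_a$ of $\mu$
has density $f$ which is approachable from below, then

\[ \limsup_{N\to\infty} N^{1/d}\, V_{N,p}^{\mathrm{rand}}(\mu) \le \overline{\kappa}_p^{\mathrm{unif}} \Big( \int_{\mathbb{R}^d} f(x)^{1-p/d}\, dx \Big)^{1/p}. \qquad (20) \]

If the absolutely continuous part $\mu_a$ of $\mu$ has density $f$ which is approachable
from above, then

\[ \liminf_{N\to\infty} N^{1/d}\, V_{N,p}^{\mathrm{rand}}(\mu) \ge \underline{\kappa}_p^{\mathrm{unif}} \Big( \int_{\mathbb{R}^d} f(x)^{1-p/d}\, dx \Big)^{1/p}. \qquad (21) \]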

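Proof. We only prove the first statement since the second one is proved
analogously (first establishing a corresponding version of Proposition 2). Let $\delta > 0$
and set, with $\mu_s := \mu - \mu_a$ denoting the singular part of $\mu$,

\[ \mu^{(1)} = \frac{\mu_a|_{B(0,\delta)}}{\mu_a(B(0,\delta))}, \quad \mu^{(2)} = \frac{\mu_s|_{B(0,\delta)}}{\mu_s(B(0,\delta))} \quad\text{and}\quad \mu^{(3)} = \frac{\mu|_{B(0,\delta)^c}}{\mu(B(0,\delta)^c)}, \]

where we let $\mu^{(k)}$ be an arbitrary probability measure in case the denominator
is zero. As in the proof of Proposition 2 we represent $\hat\mu_N$ with the help of
independent sequences of empirical measures $(\hat\mu^{(1)}_n)_{n\in\mathbb{N}}, \dots, (\hat\mu^{(3)}_n)_{n\in\mathbb{N}}$ and an
independent multinomially distributed random variable $M = (M_k)_{k=1,2,3}$ with
parameters $N$ and $(\mu_a(B(0,\delta)), \mu_s(B(0,\delta)), \mu(B(0,\delta)^c))$ as

\[ \hat\mu_N = \frac{1}{N} \sum_{k=1}^{3} M_k\, \hat\mu^{(k)}_{M_k}. \]

As before one observes that for the random measure $\nu = \sum_{k=1}^{3} \frac{M_k}{N}\, \mu^{(k)}$ one has
$E[\rho_p^p(\mu,\nu)]^{1/p} = O(N^{-1/(2p)})$. Further, by Lemma 1,

\[ N\, E[\rho_p^p(\nu,\hat\mu_N)] \le \sum_{k=1}^{3} E[M_k\, \rho_p^p(\mu^{(k)},\hat\mu^{(k)}_{M_k})] \]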
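and, by Propositions 2 and 3, there exist concave functions $\varphi_1$ and $\varphi_2$ with

\[ n\, V_{n,p}^{\mathrm{rand}}(\mu^{(k)})^p \le \varphi_k(n), \quad\text{for } n \in \mathbb{N},\ k = 1,2, \]

and

\[ \varphi_1(n) \sim (\overline{\kappa}_p^{\mathrm{unif}})^p\, n^{1-p/d} \int_{B(0,\delta)} \frac{f(x)^{1-p/d}}{\mu_a(B(0,\delta))^{1-p/d}}\, dx \quad\text{and}\quad \varphi_2(n) = o(n^{1-p/d}) \]

as $n \to \infty$. By Jensen's inequality, $E[M_k\, \rho_p^p(\mu^{(k)},\hat\mu^{(k)}_{M_k})] \le \varphi_k(E[M_k])$ so that

\[ \limsup_{N\to\infty} \frac{1}{N^{1-p/d}}\, E[M_1\, \rho_p^p(\mu^{(1)},\hat\mu^{(1)}_{M_1})] \le (\overline{\kappa}_p^{\mathrm{unif}})^p \int_{B(0,\delta)} f(x)^{1-p/d}\, dx. \]

Analogously, using Proposition 3,

\[ \lim_{N\to\infty} \frac{1}{N^{1-p/d}}\, E[M_2\, \rho_p^p(\mu^{(2)},\hat\mu^{(2)}_{M_2})] = 0 \]

and, by Theorem 3,

\[ \limsup_{N\to\infty} \frac{1}{N^{1-p/d}}\, E[M_3\, \rho_p^p(\mu^{(3)},\hat\mu^{(3)}_{M_3})] \le (\kappa_{p,q}^{\mathrm{Pierce}})^p \Big[ \int_{B(0,\delta)^c} \|x\|_{\max}^q\, d\mu(x) \Big]^{p/q}, \]

where we used that $1 - \frac{p}{q} - \frac{p}{d} > 0$. Altogether, we get

\[ \limsup_{N\to\infty} N^{p/d}\, E[\rho_p^p(\mu,\hat\mu_N)] \le (\overline{\kappa}_p^{\mathrm{unif}})^p \int_{B(0,\delta)} f(x)^{1-p/d}\, dx + (\kappa_{p,q}^{\mathrm{Pierce}})^p \Big[ \int_{B(0,\delta)^c} \|x\|_{\max}^q\, d\mu(x) \Big]^{p/q} \]

and letting $\delta \to \infty$ finishes the proof. □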
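
The exponent condition invoked in the last step of the proof can be checked directly. The short derivation below assumes that the moment condition has the form $q > dp/(d-p)$ with $1 \le p < d/2$; this form is an assumption here, chosen because it is exactly the threshold at which the stated inequality holds.

```latex
\[
q > \frac{dp}{d-p}
\iff \frac{1}{q} < \frac{d-p}{dp} = \frac{1}{p} - \frac{1}{d}
\iff \frac{p}{q} + \frac{p}{d} < 1
\iff 1 - \frac{p}{q} - \frac{p}{d} > 0 .
\]
```

In particular, for a mass $m \in [0,1]$ one has $m^{1 - p/q - p/d} \le 1$, which is presumably how the normalising mass $\mu(B(0,\delta)^c)$ of $\mu^{(3)}$ drops out of the bound for the third summand.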

4.2 Proof of the high resolution formula for p = 1 

In this section, we consider the special case $p = 1$. We will write $\rho$ instead of
$\rho_1$. The case $p = 1$ is special because of the following lemma.

Lemma 4 Let $\mu$, $\nu$, $\kappa$ be finite measures on $\mathbb{R}^d$ such that $\|\mu\| = \|\nu\|$. Then one
has

\[ \rho(\mu + \kappa, \nu + \kappa) = \rho(\mu,\nu). \]

Proof. One has

\[ \rho(\mu + \kappa, \nu + \kappa) = \sup\Big\{ \int f\, d(\mu + \kappa) - \int f\, d(\nu + \kappa) : f \text{ 1-Lipschitz} \Big\} \]
\[ = \sup\Big\{ \int f\, d\mu - \int f\, d\nu : f \text{ 1-Lipschitz} \Big\} = \rho(\mu,\nu). \]

□
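
Lemma 4 can be checked numerically in dimension $d = 1$, where the first-order transport cost between two finite measures of equal mass equals the integral of the absolute difference of their distribution functions. The sketch below uses that one-dimensional formula for discrete measures; the helper names and the example measures are hypothetical, and the restriction to $d = 1$ is an assumption made to keep the computation elementary.

```python
def w1_discrete(mu, nu):
    """First-order transport cost between two finite discrete measures on R
    with equal total mass, given as dicts {atom location: mass}.

    Uses the one-dimensional identity: cost = integral of |F_mu - F_nu|.
    """
    points = sorted(set(mu) | set(nu))
    cost, cdf_diff = 0.0, 0.0
    for left, right in zip(points, points[1:]):
        cdf_diff += mu.get(left, 0.0) - nu.get(left, 0.0)
        cost += abs(cdf_diff) * (right - left)
    return cost


def add_measures(a, b):
    """Sum of two discrete measures given as dicts {atom location: mass}."""
    total = dict(a)
    for x, mass in b.items():
        total[x] = total.get(x, 0.0) + mass
    return total


# Hypothetical example: mu and nu have equal mass 2, kappa is arbitrary
mu = {0.0: 1.0, 2.0: 1.0}
nu = {1.0: 2.0}
kappa = {0.5: 3.0, 1.0: 1.0}

cost_plain = w1_discrete(mu, nu)
cost_shifted = w1_discrete(add_measures(mu, kappa), add_measures(nu, kappa))
```

Adding $\kappa$ to both measures adds the same amount to both distribution functions, so their difference, and hence the cost, is unchanged. This cancellation is specific to $p = 1$.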

The following lemma shows that the map $\mu \mapsto \limsup_{N\to\infty} (N^{1/d}\, V_{N,1}^{\mathrm{rand}}(\mu))$
and likewise $\mu \mapsto \liminf_{N\to\infty} (N^{1/d}\, V_{N,1}^{\mathrm{rand}}(\mu))$ are continuous with respect to
the total variation norm.
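Lemma 5 Let $d \ge 3$ and $q > \frac{d}{d-1}$. For probability measures $\mu$ and $\nu$ on $\mathbb{R}^d$
one has

\[ \limsup_{N\to\infty} N^{1/d}\, \big| V_{N,1}^{\mathrm{rand}}(\mu) - V_{N,1}^{\mathrm{rand}}(\nu) \big| \le 2\, \kappa_{1,q}^{\mathrm{Pierce}}\, \|\mu - \nu\|^{1 - \frac{1}{d} - \frac{1}{q}} \Big( \int \|x\|_{\max}^q\, |\mu - \nu|(dx) \Big)^{1/q}. \]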

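Proof. Without loss of generality, we assume that $\mu \ne \nu$. Let $\alpha = \frac{\mu \wedge \nu}{\|\mu \wedge \nu\|}$,
$\mu^* = \frac{\mu - \mu \wedge \nu}{\|\mu - \mu \wedge \nu\|}$ and $\nu^* = \frac{\nu - \mu \wedge \nu}{\|\nu - \mu \wedge \nu\|}$ ($\alpha$ being an arbitrary probability measure in
case $\mu \wedge \nu = 0$). For fixed $N \in \mathbb{N}$ let $(M_1, M_2)$ be multinomially distributed
with parameters $N$ and $(\|\mu \wedge \nu\|, 1 - \|\mu \wedge \nu\|)$. We represent $\hat\mu_N$ and $\hat\nu_N$ as
combinations of independent empirical measures $(\hat\alpha_n)$, $(\hat\mu^*_n)$ and $(\hat\nu^*_n)$ as

\[ N \hat\mu_N = M_1 \hat\alpha_{M_1} + M_2\, \hat\mu^*_{M_2} \quad\text{and}\quad N \hat\nu_N = M_1 \hat\alpha_{M_1} + M_2\, \hat\nu^*_{M_2}. \]

Then

\[ \rho(N\mu, N\hat\mu_N) \le \rho(N\mu, M_1 \alpha + M_2 \mu^*) + \rho(M_1 \alpha + M_2 \mu^*, M_1 \hat\alpha_{M_1} + M_2 \hat\mu^*_{M_2}) \]
\[ \le \rho(N\mu, M_1 \alpha + M_2 \mu^*) + \rho(M_1 \alpha, M_1 \hat\alpha_{M_1}) + \rho(M_2 \mu^*, M_2 \hat\mu^*_{M_2}). \qquad (22) \]

Observe that

\[ E[\rho(N\mu, M_1 \alpha + M_2 \mu^*)] = O(N^{1/2}). \qquad (23) \]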
Further, by Theorem [3] and Jensen's inequality, one has 

E[p(M 2M *,M 2 /ik)] < ^ril/i-HI 1 -^^ 1 -^/ IWIL^ (M-^)+(dx)) "+0(7V^), 

(24) 

where we used that $(\mu - \nu)_+ = \|\mu - \nu\|\, \mu^*$. Conversely, by Lemma 4 and
Lemma 1,

\[ \rho(M_1 \alpha, M_1 \hat\alpha_{M_1}) = \rho(M_1 \alpha + M_2 \hat\nu^*_{M_2}, M_1 \hat\alpha_{M_1} + M_2 \hat\nu^*_{M_2}) \]
\[ = \rho(M_1 \alpha + M_2 \hat\nu^*_{M_2}, N \hat\nu_N) \]
\[ \le \rho(N\nu, N\hat\nu_N) + \rho(M_1 \alpha + M_2 \hat\nu^*_{M_2}, N\nu) \]
\[ = \rho(N\nu, N\hat\nu_N) + \rho(M_1 \alpha + M_2 \hat\nu^*_{M_2} + M_2 \nu^*, N\nu + M_2 \nu^*) \]
\[ \le \rho(N\nu, N\hat\nu_N) + \rho(M_2 \hat\nu^*_{M_2}, M_2 \nu^*) + \rho(M_1 \alpha + M_2 \nu^*, N\nu). \]

The expected values of the last two summands can be estimated like (24) and
(23). Inserting the estimates into (22), the assertion of the lemma follows. □

We now prove the general upper and lower bounds in the case p = 1. 

Proof of Theorem 2 (ii) for $p = 1$. Let $\mu = \mu_a + \mu_s$ be the Lebesgue
decomposition of $\mu$ and let $f$ denote the density of $\mu_a$. It is now straightforward
to verify that $\mu^{(n)}$ with density

\[ f^{(n)}(x) = 2^{-nd} \int_{S_{n,m_1,\dots,m_d}} f(y)\, dy \quad\text{for } x \in S_{n,m_1,\dots,m_d}, \]

where $S_{n,m_1,\dots,m_d} := 2^n([m_1, m_1 + 1) \times \dots \times [m_d, m_d + 1))$, satisfies
$\|\mu_a - \mu^{(n)}\| \to 0$ and $\int \|x\|_{\max}^q\, |\mu_a - \mu^{(n)}|(dx) \to 0$. Since $\mu^{(n)} + \mu_s$ is approachable
from below and above, Lemma 5 allows us to extend the upper and lower bounds
of Theorem 4 to the case with general density if $p = 1$. □
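
The averaging step in the proof above replaces the density by its mean over each cell of a dyadic grid. The following Python sketch illustrates this in dimension $d = 1$ on $[0,1)$, with cells of side $2^{-n}$; the scaling convention, the function names and the test density $f(x) = 2x$ are assumptions made for illustration only.

```python
def dyadic_cell_averages(cdf, n):
    """Averages of a density on [0, 1) over the dyadic cells of side 2**-n,
    computed from its distribution function cdf."""
    cells = 2 ** n
    h = 1.0 / cells
    return [(cdf((m + 1) * h) - cdf(m * h)) / h for m in range(cells)]


def l1_error(density, cdf, n, sub=200):
    """Midpoint-rule approximation of the L1 distance between the density
    and its dyadic cell averages (the total variation of mu_a - mu^(n))."""
    averages = dyadic_cell_averages(cdf, n)
    h = 1.0 / len(averages)
    total = 0.0
    for m, avg in enumerate(averages):
        for j in range(sub):
            x = m * h + (j + 0.5) * h / sub  # midpoint of the j-th sub-interval
            total += abs(density(x) - avg) * h / sub
    return total


def f(x):
    # Hypothetical test density on [0, 1)
    return 2.0 * x


def F(x):
    # Distribution function of the test density
    return x * x


err_3 = l1_error(f, F, 3)
err_5 = l1_error(f, F, 5)
```

For this density the error is halved with every refinement of the grid, which illustrates the total variation convergence used in the proof.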